Corpus-Assisted Expansion of Manual MT Knowledge
Authors
Abstract
Expanding MT knowledge is currently done by humans, which makes it slow and expensive. This paper proposes a new procedure that expands MT knowledge efficiently by supporting human judgements with information collected automatically from any number of corpora. The procedure uses the source knowledge already present in an MT system as a key to retrieve source-language information from the corpora, and uses the partial translations produced by the MT system to acquire target-language information. Together, these two techniques can reduce both time and labor costs. Experimental results confirm both benefits.
Similar resources
The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
Automatic Construction of Translation Knowledge for Corpus-based Machine Translation
Many machine translation (MT) systems that utilize the knowledge automatically acquired from bilingual corpora have been proposed in conjunction with efforts to accumulate corpora. We call this approach corpus-based machine translation in this thesis. This thesis focuses on automatic construction of the translation knowledge needed for corpus-based MT and discusses the following three tasks. 1....
Annotating Named Entities in Consumer Health Questions
We describe a corpus of consumer health questions annotated with named entities. The corpus consists of 1548 de-identified questions about diseases and drugs, written in English. We defined 15 broad categories of biomedical named entities for annotation. A pilot annotation phase in which a small portion of the corpus was double-annotated by four annotators was followed by a main phase in which ...
PE2rr Corpus: Manual Error Annotation of Automatically Pre-annotated MT Post-edits
We present a freely available corpus containing source language texts from different domains along with their automatically generated translations into several distinct morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. We believe that the corpus will be useful for many different applications. The main advantage of the approa...
Automated Corpus Analysis and the Acquisition of Large, Multi-Lingual Knowledge Bases for MT
Although knowledge-based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand-coded knowledge (lexicons, grammars, mapping rules, etc.). Systems like KBMT-89 and its descendants have demonstrated how knowledge-based translation can produce good results in technical domains with tractable domain semantics. Neverthe...